Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

Authors

  • Yan Zheng
  • Yi Ou
  • Alexander Lex
  • Jeff M. Phillips
Abstract

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling spatial data that is suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampling. We also introduce a method to ensure that thresholding of low values based on sampled data does not omit any regions above the desired threshold. We demonstrate the effectiveness of our approach using both artificial and real-world large geospatial datasets.
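
As a rough illustration of the baseline the abstract compares against, the following sketch measures the worst-case (L∞) gap between a kernel density estimate built from all points and one built from a uniform random sample. It is not the coreset construction from the paper; the synthetic data, the bandwidth, the sample size, the threshold `tau`, and the helper name `gaussian_kde` are all assumptions chosen for illustration. The last lines show one simple way to threshold conservatively: if the error on the evaluated grid is at most `err`, lowering the threshold by `err` on the sampled estimate keeps every grid cell that exceeds `tau` in the full estimate.

```python
import numpy as np

def gaussian_kde(points, queries, bandwidth):
    """Average of Gaussian kernels centered at `points`, evaluated at `queries`.

    The kernel is left unnormalized, so every returned value lies in [0, 1].
    """
    diff = queries[:, None, :] - points[None, :, :]   # shape (q, n, 2)
    sq_dist = np.sum(diff ** 2, axis=2)               # shape (q, n)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)

# Synthetic stand-in for a spatial dataset: two Gaussian clusters (assumption).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(1000, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(1000, 2)),
])

# Evaluation grid standing in for the pixels of a density map.
gx, gy = np.meshgrid(np.linspace(-2, 6, 20), np.linspace(-2, 6, 20))
grid = np.column_stack([gx.ravel(), gy.ravel()])

bandwidth = 0.5  # hypothetical kernel bandwidth
sample = data[rng.choice(len(data), size=200, replace=False)]

full_kde = gaussian_kde(data, grid, bandwidth)
sample_kde = gaussian_kde(sample, grid, bandwidth)

# Worst-case difference between the two estimates over the grid.
err = float(np.max(np.abs(full_kde - sample_kde)))

# Conservative thresholding: lower the threshold by the measured error
# (plus a tiny slack for floating-point rounding) so that no grid cell
# above `tau` in the full estimate is dropped.
tau = 0.05  # hypothetical density threshold
above_full = full_kde >= tau
kept_by_sample = sample_kde >= tau - err - 1e-12
```

In practice the full estimate is exactly what one wants to avoid computing, so `err` would have to come from a guarantee on the subsample rather than from a direct comparison as done here.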

Similar resources

Improved Coresets for Kernel Density Estimates

We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the L∞ error between kernel density estimates within error ε, with cor...

Near-Optimal Coresets of Kernel Density Estimates

We construct near-optimal coresets for kernel density estimate for points in R^d when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size O( √ d log(1/ε)/ε), and we show a near-matching lower bound of size Ω( √ d/ε). The upper bound is a polynomial in 1/ε improvement when d ∈ [3, 1/ε^2) (for all kernels except the Gaussian kernel which had a ...

Small and Stable Descriptors of Distributions for Geometric Statistical Problems

This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on P and C, then the difference in the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set...

Spatio-Temporal Big Data Analytics for Environmental Health

The framework for our proposed big data analytics platform is shown in Figure 1. Two complementary systems support the wide variety of spatial analytics algorithms and techniques we are providing. On the left half of Figure 1, the more-traditional Unix filesystem supports high-throughput computation (e.g., MPI [Snir et al., 1995], OpenMP [Dagum and Menon, 1998], GPGPU/CUDA [Luebke et al., 2006])...

Determining Effective Factors on Land Surface Temperature of Tehran Using LANDSAT Images And Integrating Geographically Weighted Regression With Genetic Algorithm

Due to urbanization and changes in the urban thermal environment, and since the land surface temperature (LST) in urban areas is a few degrees higher than in surrounding non-urbanized areas, identifying the spatial factors affecting LST in urban areas is very important. Hence, by identifying these factors, preventing this phenomenon becomes possible using general education, inserting rules and al...

Journal:
  • CoRR

Volume abs/1709.04453

Publication date 2017